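The drug discovery and development process is long and expensive, costing over 1 billion US dollars per drug on average and taking 10-15 years. To reduce the high attrition throughout the process, machine learning methods have increasingly been applied over the last decade to various stages of drug discovery and development, especially the earliest stage: identifying druggable disease genes. In this paper, we develop a new tensor factorization model to predict potential drug targets (genes or proteins) for treating diseases. We created a three-dimensional data tensor consisting of 1,048 gene targets, 860 diseases, and 230,011 evidence attributes and clinical outcomes connecting them, using data extracted from Open Targets and a drug database. We enriched the data with gene target representations learned from a drug discovery knowledge graph and applied our proposed method to predict clinical outcomes for unseen gene target and disease pairs. We designed three evaluation strategies to measure predictive performance and benchmarked several commonly used machine learning classifiers together with Bayesian matrix and tensor factorization methods. The results show that incorporating knowledge graph embeddings significantly improves prediction accuracy, and that training tensor factorization alongside a dense neural network outperforms all other baselines. In summary, our framework combines two actively studied machine learning approaches to disease target identification, namely tensor factorization and knowledge graph representation learning, which may be a promising avenue for further exploration in data-driven drug discovery.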
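Drug discovery and development is a complex and costly process. Machine learning methods are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Among these, methods that use knowledge graphs (KGs) show promise in many tasks, including drug repurposing, drug toxicity prediction, and target gene-disease prioritization. In a drug discovery KG, key elements including genes, diseases, and drugs are represented as entities, while the relationships between them indicate interactions. However, constructing a high-quality KG requires suitable data. In this review, we detail publicly available sources suitable for constructing drug discovery focused KGs. We aim to help guide machine learning and KG practitioners who are interested in applying new techniques to the drug discovery field but who may be unfamiliar with the relevant data sources. The datasets are selected by strict criteria, categorized according to the primary type of information they contain, and considered in terms of what information can be extracted from them to build a KG. We then present a comparative analysis of existing public drug discovery KGs and evaluate selected motivating case studies from the literature. In addition, we raise numerous challenges and issues, some unique to the domain and its datasets, while highlighting key future research directions. We hope this review will motivate the use of KGs in solving key and emerging questions in the drug discovery domain.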
The Government of Kerala increased the frequency with which it supplied free food kits owing to the pandemic; however, the items in these kits were fixed and did not reflect the personal preferences of consumers. This paper conducts a comparative analysis of various clustering techniques on a scaled-down version of a real-world dataset obtained through a conjoint analysis-based survey. Clustering by centroid-based methods such as k-means is analyzed and the results are plotted alongside those of SVD, and a conclusion is reached as to which of the two is better. Once the clusters have been formed, commodities are chosen for each cluster. Clustering is further enhanced by reassignment based on a specific cluster loss threshold. In this way, the most effective clustering technique for designing a food kit tailored to the needs of individuals is obtained.
Generative AI has matured to a point where large-scale models can generate text that seems indistinguishable from human-written text and remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better models. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore four approaches to statistically estimate these scores: vector quantization, non-parametric estimation, classifier-based estimation, and parametric Gaussian approximations. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We conclude the paper by demonstrating its applications to other AI domains and discussing practical recommendations.
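As a rough illustration of the divergence frontier idea mentioned above, the following sketch computes pairs of KL divergences between two discrete distributions and their mixtures. It assumes the two distributions have already been quantized into histograms over the same bins; the function names `kl` and `divergence_frontier`, the toy histograms, and the choice of KL divergence are illustrative assumptions, not the authors' implementation.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def divergence_frontier(p, q, num_points=5):
    """Return (KL(q || r), KL(p || r)) pairs for mixtures r = lam*p + (1-lam)*q.

    lam is kept strictly inside (0, 1), so r is positive wherever p or q is.
    """
    points = []
    for i in range(1, num_points + 1):
        lam = i / (num_points + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        points.append((kl(q, r), kl(p, r)))
    return points


# Hypothetical histograms for "human" and "model" text over three bins.
human = [0.5, 0.3, 0.2]
model = [0.2, 0.3, 0.5]
frontier = divergence_frontier(human, model)
```

Each pair captures one of the two error types at a given mixture weight; identical distributions give pairs that are zero up to floating-point error.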
Linguists distinguish between novel and conventional metaphor, a distinction which the metaphor detection task in NLP does not take into account. Instead, metaphoricity is formulated as a property of a token in a sentence, regardless of metaphor type. In this paper, we investigate the limitations of treating conventional metaphors in this way, and advocate for an alternative which we name 'metaphorical polysemy detection' (MPD). In MPD, only conventional metaphoricity is treated, and it is formulated as a property of word senses in a lexicon. We develop the first MPD model, which learns to identify conventional metaphors in the English WordNet. To train it, we present a novel training procedure that combines metaphor detection with word sense disambiguation (WSD). For evaluation, we manually annotate metaphor in two subsets of WordNet. Our model significantly outperforms a strong baseline based on a state-of-the-art metaphor detection model, attaining an ROC-AUC score of .78 (compared to .65) on one of the sets. Additionally, when paired with a WSD model, our approach outperforms a state-of-the-art metaphor detection model at identifying conventional metaphors in text (.659 F1 compared to .626).
A widely acknowledged shortcoming of WordNet is that it lacks a distinction between word meanings which are systematically related (polysemy), and those which are coincidental (homonymy). Several previous works have attempted to fill this gap, by inferring this information using computational methods. We revisit this task, and exploit recent advances in language modelling to synthesise homonymy annotation for Princeton WordNet. Previous approaches treat the problem using clustering methods; by contrast, our method works by linking WordNet to the Oxford English Dictionary, which contains the information we need. To perform this alignment, we pair definitions based on their proximity in an embedding space produced by a Transformer model. Despite the simplicity of this approach, our best model attains an F1 of .97 on an evaluation set that we annotate. The outcome of our work is a high-quality homonymy annotation layer for Princeton WordNet, which we release.
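The pairing step described above, matching definitions by proximity in an embedding space, can be sketched as a nearest-neighbour search under cosine similarity. This is a minimal sketch that assumes the definition embeddings have already been computed; the helper names `cosine` and `align` and the toy vectors are hypothetical and stand in for the output of a Transformer encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(source_vecs, target_vecs):
    """For each source vector, return the index of the most similar target vector."""
    return [
        max(range(len(target_vecs)), key=lambda j: cosine(u, target_vecs[j]))
        for u in source_vecs
    ]


# Toy 2-D "embeddings" standing in for two sets of dictionary definitions.
source_defs = [[1.0, 0.0], [0.0, 1.0]]
target_defs = [[0.0, 2.0], [3.0, 0.1]]
pairs = align(source_defs, target_defs)
```

A real alignment would likely also need a similarity threshold for definitions with no counterpart in the other dictionary; that is omitted here.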
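When applied to autonomous vehicle settings, action recognition can help enrich an environment model's understanding of the world and improve planning of future actions. To improve autonomous vehicle decision-making, we propose in this work a novel two-stage online action recognition system, termed RADAC. RADAC poses the problem of active agent detection and adapts ideas about actor-context relations from human activity recognition into a straightforward two-stage pipeline for action detection and classification. We show that our proposed scheme can outperform the baseline on the ICCV2021 ROAD challenge dataset and, by deploying it on a real vehicle platform, we demonstrate how a higher-order understanding of agent actions in an environment can improve decision-making on a real autonomous vehicle.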
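Historically, trajectory planning and control have been separated into two modules in the autonomous driving stack. Trajectory planning focuses on higher-level tasks such as avoiding obstacles and staying on the road surface, while the controller does its best to follow an ever-changing reference trajectory. We argue that this separation is (1) flawed, because of the mismatch between the planned trajectory and what the controller can feasibly execute, and (2) unnecessary, because of the flexibility of the model predictive control (MPC) paradigm. Instead, in this paper we present a unified MPC-based trajectory planning and control scheme that guarantees feasibility with respect to road boundaries and the static and dynamic environment, and that enforces passenger comfort constraints. The scheme is evaluated rigorously in a variety of scenarios designed to demonstrate the effectiveness of the optimal control problem (OCP) design and the real-time solution methods. The prototype code will be released at https://github.com/watonomous/control.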
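We challenge AI models to "demonstrate understanding" of the sophisticated multimodal humor of The New Yorker Caption Contest. Specifically, we develop three carefully circumscribed tasks that involve grasping potentially complex and unexpected relationships between image and caption, and similarly complex and unexpected allusions to the wide variety of human experience; these are the hallmarks of a New Yorker-caliber cartoon. We investigate vision-and-language models that take the cartoon pixels and caption directly as input, as well as language-only models for which we circumvent image processing by providing textual descriptions of the image. Even with the rich, multifaceted annotations we provide for the cartoon images, we identify performance gaps between high-quality machine learning models (e.g., a fine-tuned, 175B-parameter language model) and humans. We publicly release our corpus, including annotations describing the image's locations/entities, what is unusual about the scene, and an explanation of the joke.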
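High-quality observations of the real world are crucial for a variety of applications, including producing 3D-printed replicas of small-scale scenes and inspecting large-scale infrastructure. These 3D observations are commonly obtained by combining multiple sensor measurements taken from different views. Guiding the selection of suitable views is known as the next best view (NBV) planning problem. Most NBV approaches reason about measurements using rigid data structures (e.g., surface meshes or voxel grids). This simplifies next best view selection but can be computationally expensive, reduces real-world fidelity, and couples the selection of a next best view with the final data processing. This paper presents the Surface Edge Explorer (SEE), an NBV approach that selects new observations directly from previous sensor measurements without requiring rigid data structures. SEE uses measurement density to propose next best views that increase coverage of insufficiently observed surfaces while avoiding potential occlusions. Statistical results from simulated experiments show that SEE can attain better surface coverage with less computation time and sensor travel distance than the evaluated volumetric approaches on both small- and large-scale scenes. Real-world experiments demonstrate SEE autonomously observing a deer statue using a 3D sensor affixed to a robotic arm.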